Restart psi-secrets after refresh so serve reloads fresh cache #35

Merged
jdoss merged 1 commit into master from fix/refresh-restarts-serve
Apr 17, 2026

@jdoss jdoss commented Apr 17, 2026

Summary

Every time psi-{provider}-refresh.timer fires, setup re-registers secrets via delete+create through the Podman API, which assigns fresh hex IDs. Setup writes those new IDs to the on-disk cache file, and the prune step from PR #32 drops the old entries. But serve still holds the old cache in memory from its last startup and never re-reads the file. As a result, every lookup after the first refresh goes straight to the provider, and the cache does no work until an operator manually restarts psi-secrets.

Observed on the test server

1554 secret lookups over 30 minutes, zero cache hits. All source: provider. The refresh timer had fired 7 minutes earlier and silently broke the cache. Victoria Logs flagged it via the sheer volume of provider-source events.

Fix

Add a second ExecStart to the refresh wrapper that runs systemctl try-restart psi-secrets.service after setup completes. try-restart is a no-op if serve is not currently active, so this is safe on hosts that have intentionally stopped psi-secrets.
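For reviewers unfamiliar with the wrapper, here is a rough sketch of the generated unit's shape. It is an illustration only: `render_refresh_service`, the unit description, and the example setup command are hypothetical and do not mirror the actual unitgen code. It relies on the systemd behaviour that a `Type=oneshot` service may list several `ExecStart=` lines, which run in order and stop at the first failure.

```python
def render_refresh_service(provider: str, setup_command: str) -> str:
    """Render a oneshot refresh service (hypothetical helper).

    The second ExecStart only runs if the setup command succeeded, and
    try-restart does nothing when psi-secrets.service is not active.
    """
    lines = [
        "[Unit]",
        f"Description=Refresh PSI secrets for {provider}",
        "",
        "[Service]",
        "Type=oneshot",
        # Step 1: re-register secrets and rewrite the on-disk cache.
        f"ExecStart={setup_command}",
        # Step 2: make a running serve process reload the fresh cache.
        "ExecStart=/usr/bin/systemctl try-restart psi-secrets.service",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The provider name and command path are placeholders.
    print(render_refresh_service("example", "/usr/local/bin/psi setup"))
```

Ordering matters here: restarting serve before setup has finished writing the cache file would just reload the stale state.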

During the serve restart there is a brief window (~30s on HSM) in which lookups cannot be served from the cache. That cost is paid at most once per cache.refresh_interval (default 1h), whereas without the restart the cache is never used again after the first refresh.

Test plan

  • pytest tests/test_unitgen.py — new regression test test_restarts_psi_secrets_so_serve_reloads_the_fresh_cache; all 52 unitgen tests pass.
  • ruff check / ty check — clean.
  • Deploy to test server, run psi systemd install to regenerate the wrapper, fire the refresh, confirm Victoria Logs shows source: cache entries going forward.

Remaining issue (separate PR)

Victoria Logs classifies all PSI log entries as level: error because loguru writes to stderr and conmon maps stderr → PRIORITY: 3. A follow-up PR will split INFO/DEBUG to stdout (PRIORITY 6).
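The intended split could look roughly like the sketch below. Note that it uses the standard library `logging` module rather than loguru so that it stays self-contained. `build_logger` and `BelowWarning` are hypothetical names, and the follow-up PR may well do this differently.

```python
import io
import logging
import sys
from typing import Optional, TextIO


class BelowWarning(logging.Filter):
    """Let through only records below WARNING (that is, DEBUG and INFO)."""

    def filter(self, record: logging.LogRecord) -> bool:
        return record.levelno < logging.WARNING


def build_logger(
    name: str, out: Optional[TextIO] = None, err: Optional[TextIO] = None
) -> logging.Logger:
    """Send DEBUG/INFO to stdout and WARNING and above to stderr."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()
    logger.propagate = False

    low = logging.StreamHandler(out if out is not None else sys.stdout)
    low.addFilter(BelowWarning())

    high = logging.StreamHandler(err if err is not None else sys.stderr)
    high.setLevel(logging.WARNING)

    for handler in (low, high):
        handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    out, err = io.StringIO(), io.StringIO()
    log = build_logger("psi-demo", out, err)
    log.info("cache hit")
    log.error("provider unreachable")
    print("stdout:", out.getvalue(), end="")
    print("stderr:", err.getvalue(), end="")
```

With this routing, only WARNING and above would reach stderr and be mapped to the error priority.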

@jdoss jdoss merged commit 89e373c into master Apr 17, 2026
1 of 2 checks passed